The Flex Pre-Processor

Published in Software

Rationale

The original reason for creating the Flex(ible) Pre-Processor was to simulate the pre-processing that is built into many compilers. Often structures such as comments and conditional includes are permitted anywhere in a source code file, which breaks conformity with the language's grammar (or parser) if not removed. Many of these can be dealt with using fairly simple text processing.

To support rapid prototyping of language processing engines, I created a utility that runs a series of text operations against text input, driven by an instructions file. This avoids the need for a special-purpose program just to pre-process the source code before it is passed to the next stage of processing (generally the parser).

Usage

Flex Pre-Processor is a Windows command line application based on the .NET Framework 4 Client Profile libraries. Input may be sourced from either the console or a file, and similarly output may be sent to the console or a file. Instructions must always be provided in an external text file.

Instructions File

The instructions file is a simple tab-delimited text file carrying one instruction on each row, to be executed sequentially from top to bottom. The number of tab-delimited columns required varies from one instruction to the next.
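
To make the file layout concrete, here is a minimal Python sketch of how such a file could be read. It is not the tool's actual source (the tool is a .NET application); the function name parse_instructions is hypothetical, and skipping blank rows and treating instruction names as case-insensitive are assumptions.

```python
def parse_instructions(text):
    """Split a tab-delimited instructions file into (command, args) pairs."""
    instructions = []
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank rows are skipped
        columns = line.split("\t")
        # assumption: instruction names are matched case-insensitively
        command = columns[0].strip().lower()
        instructions.append((command, columns[1:]))
    return instructions


sample = (
    "regexReplace\t\\n\\s*(\\S*)\t\\n$1\tRemove padding\n"
    "replace\tOldCo\tNewCo\tUpdate company name\n"
)
for command, args in parse_instructions(sample):
    print(command, args)
```

Each row keeps every column after the instruction name, so trailing columns used as comments are simply carried along and can be ignored by the instruction that does not need them.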

regexreplace

Searches for a regular expression (arg 1) and replaces each match with the given text (arg 2). Replacement text may be empty but there must be at least two tab characters on the line to mark the parameters.

Within replacement text, "\r" and "\n" are replaced with a carriage return and line feed respectively. Regular expression capture groups may also be substituted back into the replacement text, starting at $1.

regexReplace search_pattern replacement_text [extra parms ignored, handy for comments]

E.g.
regexReplace    \n\s*(\S*)  \n$1        Remove extra padding from beginning of lines

regexreplacefirst

Searches for a regular expression (arg 1) and replaces the first match only with the given text (arg 2). Replacement text may be empty but there must be at least two tab characters on the line to mark the parameters.

Within replacement text, "\r" and "\n" are replaced with a carriage return and line feed respectively. Regular expression capture groups may also be substituted back into the replacement text, starting at $1.

regexReplaceFirst search_pattern replacement_text [extra parms ignored, handy for comments]

E.g.
regexReplaceFirst   ID(\w), "ID$1",     Place first ID* field in quote marks
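
The behaviour of the two regex replace instructions above can be approximated in Python as follows. This is a sketch under stated assumptions, not the tool's implementation: the names expand_replacement and regex_replace are hypothetical, and Python's re module differs from the .NET regex engine in some details, so patterns may not behave identically.

```python
import re


def expand_replacement(template, match):
    """Expand the two-character escapes \\r and \\n, then $1, $2, ... groups."""
    text = template.replace("\\r", "\r").replace("\\n", "\n")
    # An unmatched optional group yields None, which is treated as empty text.
    return re.sub(r"\$(\d+)", lambda m: match.group(int(m.group(1))) or "", text)


def regex_replace(source, pattern, template, count=0):
    """Replace every match (count=0) or only the first match (count=1)."""
    return re.sub(pattern, lambda m: expand_replacement(template, m), source, count=count)


# regexReplace example: strip leading padding from lines
print(repr(regex_replace("a\n    b\n\tc", r"\n\s*(\S*)", "\\n$1")))
# regexReplaceFirst example: quote only the first ID* field
print(regex_replace("IDA, IDB,", r"ID(\w),", '"ID$1",', count=1))
```

The replacement is built by a function rather than passed to re.sub as a string, so captured text is inserted literally and never re-interpreted as an escape sequence.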

regexreplacefirstbyparms

Searches for a regular expression (arg 1) and replaces the first unique occurrence of the first capture group (parenthesised part) with the given text (arg 2). This is handy if something may be declared once and referred to many times later in the text (i.e. source code) using similar syntax, and you need to highlight the declarations.

regexReplaceFirstByParms search_pattern replacement_text  [extra parms ignored, handy for comments]

E.g.
regexReplaceFirstByParms    \n\s*IMAGE\s+(([a-z]|[A-Z]|\.|_)+)  \nIMAGED $1         Replace IMAGE token with IMAGED and trim leading space
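
One reading of "first unique occurrence" is sketched below in Python: a match is replaced only the first time its first capture group takes a given value, and later matches with the same value are left untouched. That interpretation is an assumption, as is the hypothetical function name; the tool itself may differ.

```python
import re


def regex_replace_first_by_parms(source, pattern, template):
    """Replace a match only the first time its capture group 1 value is seen."""
    seen = set()

    def substitute(match):
        key = match.group(1)
        if key in seen:
            return match.group(0)  # assumption: repeats are left unchanged
        seen.add(key)
        text = template.replace("\\r", "\r").replace("\\n", "\n")
        return re.sub(r"\$(\d+)", lambda m: match.group(int(m.group(1))) or "", text)

    return re.sub(pattern, substitute, source)


source = "\n  IMAGE logo\n  IMAGE logo\n  IMAGE banner"
pattern = r"\n\s*IMAGE\s+(([a-z]|[A-Z]|\.|_)+)"
print(repr(regex_replace_first_by_parms(source, pattern, "\\nIMAGED $1")))
```

In this sample the first "IMAGE logo" and the only "IMAGE banner" are rewritten, while the second "IMAGE logo" keeps its original text.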

replace

Does a plain string replacement of the search text with the replacement text. The replacement text is not modified at all.

replace search_text replacement_text  [extra parms ignored, handy for comments]

E.g.
replace OldCo   NewCo       Update company name

uppercasenonquoted

Converts elements of the source text that do not match the quoted text regular expression to upper case. Designed so that the 'code' parts of a source file can be uppercased while leaving string literals alone.

upperCaseNonQuoted  quoted_text_expression  [extra parms ignored, handy for comments]

E.g.
upperCaseNonQuoted  '.*?'       Uppercase all non-quoted strings

lowercasenonquoted

Converts elements of the source text that do not match the quoted text regular expression to lower case. Designed so that the 'code' parts of a source file can be lowercased while leaving string literals alone.

lowerCaseNonQuoted  quoted_text_expression  [extra parms ignored, handy for comments]

E.g.
lowerCaseNonQuoted  '.*?'       Lowercase all non-quoted strings
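
Both case-conversion instructions above follow the same idea: split the text at every match of the quoted-text expression, convert the gaps, and copy the matches through unchanged. A minimal Python sketch, assuming that reading (the function name convert_non_quoted is hypothetical):

```python
import re


def convert_non_quoted(source, quoted_pattern, convert):
    """Apply convert() to text outside matches of quoted_pattern."""
    pieces = []
    position = 0
    for match in re.finditer(quoted_pattern, source):
        pieces.append(convert(source[position:match.start()]))  # code part
        pieces.append(match.group(0))  # quoted part, copied as-is
        position = match.end()
    pieces.append(convert(source[position:]))  # text after the last match
    return "".join(pieces)


text = "print 'Hello' and 'bye' now"
print(convert_non_quoted(text, r"'.*?'", str.upper))
print(convert_non_quoted(text, r"'.*?'", str.lower))
```

Passing str.upper or str.lower as the converter keeps the two instructions in one routine, since they differ only in the conversion applied.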

insertsourcefileonregex

Searches for a regular expression (arg 1) and, for each match, calls an external program (arg 2) to retrieve the text that has to be included at the position of the match. The external program must be located in the current working directory or on the system PATH, accept the include name as its first argument, and write the content to include to the standard console output.

insertSourceFileonRegex include_name    external_program

E.g.
insertSourceFileonRegex \n\s*#include\s*(\S*?)\s*?\n    GetSource.exe
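
A rough Python sketch of this instruction follows. It assumes that the include name is taken from the first capture group (as the example pattern suggests) and that the program's output replaces the whole match; neither detail is confirmed by the description above. The function names are hypothetical, and the fetch parameter exists only so the sketch can be tried without a real GetSource.exe.

```python
import re
import subprocess


def run_external(program, include_name):
    """Run the external program with the include name as its first argument."""
    result = subprocess.run(
        [program, include_name], capture_output=True, text=True, check=True
    )
    return result.stdout


def insert_source_on_regex(source, pattern, program, fetch=run_external):
    """Replace each match with the text fetched for its capture group 1."""
    # assumption: the fetched text replaces the entire matched span
    return re.sub(pattern, lambda m: fetch(program, m.group(1)), source)


def fake_fetch(program, include_name):
    """Stand-in for the external program, used for demonstration only."""
    return "\n<contents of " + include_name + ">\n"


text = "a\n#include lib.h\nb"
pattern = r"\n\s*#include\s*(\S*?)\s*?\n"
print(insert_source_on_regex(text, pattern, "GetSource.exe", fetch=fake_fetch))
```

Because the example pattern consumes the line breaks around the directive, the stand-in returns text wrapped in line breaks; a real external program would need to do the same to keep neighbouring lines separate.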

insertsourcefileonregexonce

Works the same as insertsourcefileonregex, but performs the external call only once for each include_name. All other instances are ignored.

insertSourceFileonRegexOnce include_name    external_program

E.g.
insertSourceFileonRegexOnce \n\s*#includeOnce\s*(\S*?)\s*?\n    GetSource.exe

Download

The binary file for this tool can be found here. I may post the source code on GitHub in future, and then update this post with the link.